GENERALIZED ARITHMETIC AND GEOMETRIC MEAN 
DIVERGENCE MEASURE AND THEIR STATISTICAL ASPECTS 



INDER JEET TANEJA 



Abstract. Using Blackwell's definition of comparing two experiments, a comparison is 
made for generalized AG-divergence measures having one and two scalar parameters. 
The connection of the generalized AG-divergence measures with the Fisher measure of 
information is also presented, together with a unified generalization of the AG-divergence 
and Jensen-Shannon divergence measures. 



1. Introduction 

Several measures have been introduced in the literature on information theory and 
statistics as measures of information. The most famous in the statistics literature is the 
Fisher [9] measure of information. It measures the amount of information supplied by 
data about an unknown parameter $\theta$. The most commonly used in information theory is 
the Shannon [18] entropy. It gives the amount of uncertainty concerning the outcome of 
an experiment. Kullback and Leibler [13] introduced a measure associated with two 
distributions of an experiment. It expresses the amount of information supplied by the data 
for discriminating between the distributions. As a symmetric measure, the Jeffreys-Kullback-Leibler 
J-divergence is commonly used. Rényi [15] generalized both the Shannon entropy and the 
Kullback-Leibler relative information by introducing a scalar parameter. Burbea and 
Rao [4], [5] and Taneja [21], [22] have proposed various alternative ways to generalize the 
Jeffreys-Kullback-Leibler J-divergence. The measures proposed by Burbea and Rao [4], [5] 
involve one parameter. The measures proposed by the author [22] involve two scalar parameters. 

Let $\mathcal{E}_X = \{X, S_X, P_\theta;\ \theta \in \Theta\}$ denote a statistical experiment in which a random 
variable or random vector $X$ defined on some sample space $S_X$ is to be observed and the 
distribution $P_\theta$ of $X$ depends on the parameter $\theta$ whose values are unknown and lie in 
some parameter space $\Theta$. We shall assume that there exists a generalized probability 
density function $p(x|\theta)$ for the distribution $P_\theta$ with respect to a $\sigma$-finite measure $\mu$. Let 
also $\Xi$ denote the class of all prior distributions on the parameter space $\Theta$. Given a prior 
distribution $\xi \in \Xi$, let $p(x)$ denote the corresponding marginal generalized probability 
density function (gpdf) 

$$p(x) = \int_\Theta p(x|\theta)\,d\xi(\theta).$$

Similarly, if we have two prior distributions $\xi_1, \xi_2 \in \Xi$, the corresponding marginal 
gpdf's are 

$$p_i(x) = \int_\Theta p(x|\theta)\,d\xi_i(\theta), \quad i = 1, 2.$$

In this context, the relative information, the J-divergence, the Jensen-Shannon 
divergence, and the arithmetic and geometric mean divergence measures are given as follows: 

• Relative information (Kullback and Leibler [13]) 

$$K(\xi_1;\xi_2\|X) = \int p_1(x) \ln\frac{p_1(x)}{p_2(x)}\,d\mu. \tag{1}$$

• J-divergence (Jeffreys [11], Kullback and Leibler [13]) 

$$J(\xi_1;\xi_2\|X) = \int p_1(x) \ln\frac{p_1(x)}{p_2(x)}\,d\mu + \int p_2(x) \ln\frac{p_2(x)}{p_1(x)}\,d\mu. \tag{2}$$

• Jensen-Shannon divergence (Sibson [19], Burbea and Rao [4], [5]) 

$$I(\xi_1;\xi_2\|X) = \frac{1}{2}\left[\int p_1(x) \ln\frac{2p_1(x)}{p_1(x)+p_2(x)}\,d\mu + \int p_2(x) \ln\frac{2p_2(x)}{p_1(x)+p_2(x)}\,d\mu\right]. \tag{3}$$

• AG-divergence (Taneja [22]) 

$$T(\xi_1;\xi_2\|X) = \int \frac{p_1(x)+p_2(x)}{2} \ln\left(\frac{p_1(x)+p_2(x)}{2\sqrt{p_1(x)p_2(x)}}\right) d\mu. \tag{4}$$



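The three divergence measures given above can be written in terms of the Kullback-Leibler 
relative information as 

$$J(\xi_1;\xi_2\|X) = K(\xi_1;\xi_2\|X) + K(\xi_2;\xi_1\|X), \tag{5}$$

$$I(\xi_1;\xi_2\|X) = \frac{1}{2}\left[K\left(\xi_1;\tfrac{\xi_1+\xi_2}{2}\,\Big\|\,X\right) + K\left(\xi_2;\tfrac{\xi_1+\xi_2}{2}\,\Big\|\,X\right)\right] \tag{6}$$

and 

$$T(\xi_1;\xi_2\|X) = \frac{1}{2}\left[K\left(\tfrac{\xi_1+\xi_2}{2};\xi_1\,\Big\|\,X\right) + K\left(\tfrac{\xi_1+\xi_2}{2};\xi_2\,\Big\|\,X\right)\right]. \tag{7}$$

Moreover we have the following equality holding among the three divergence measures: 

$$I(\xi_1;\xi_2\|X) + T(\xi_1;\xi_2\|X) = \frac{1}{4}J(\xi_1;\xi_2\|X). \tag{8}$$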
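As a quick numerical sanity check of the decompositions (5)-(7) and of the identity (8), the following sketch evaluates the four measures for discrete distributions, with sums replacing the integrals. The function names and the sample vectors `p` and `q` are illustrative assumptions and do not come from the paper.

```python
import math

def kl(p, q):
    """Relative information K(p||q) for strictly positive discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def midpoint(p, q):
    """Arithmetic mean (p + q) / 2 of two distributions."""
    return [(a + b) / 2 for a, b in zip(p, q)]

def j_div(p, q):
    """J-divergence as the symmetrized relative information, as in (5)."""
    return kl(p, q) + kl(q, p)

def js_div(p, q):
    """Jensen-Shannon divergence I, written as in (6)."""
    m = midpoint(p, q)
    return 0.5 * (kl(p, m) + kl(q, m))

def ag_div(p, q):
    """AG-divergence T, written as in (7)."""
    m = midpoint(p, q)
    return 0.5 * (kl(m, p) + kl(m, q))

# Hypothetical example distributions (assumed, strictly positive).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# Direct form of T: sum of m * ln(m / sqrt(p * q)).
t_direct = sum(m * math.log(m / math.sqrt(a * b))
               for m, a, b in zip(midpoint(p, q), p, q))

print(js_div(p, q) + ag_div(p, q), j_div(p, q) / 4)  # the two sides of (8)
print(ag_div(p, q), t_direct)                        # two forms of T
```

The two printed pairs should agree up to floating-point rounding.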
Recently, the author [21] proved an interesting inequality among these three divergence 
measures: 



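$$I(\xi_1;\xi_2\|X) \le \frac{1}{8}J(\xi_1;\xi_2\|X) \le T(\xi_1;\xi_2\|X), \tag{9}$$

where all the probability distributions involved are positive. 
In view of (8) the inequalities (9) can be extended as follows: 

$$I(\xi_1;\xi_2\|X) \le \frac{1}{8}J(\xi_1;\xi_2\|X) \le T(\xi_1;\xi_2\|X) \le \frac{1}{4}J(\xi_1;\xi_2\|X). \tag{10}$$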



Based on the above notation, the Csiszár [6] $\phi$-divergence is given by 

$$C_\phi(\xi_1;\xi_2\|X) = \int_X p_2(x)\,\phi\left(\frac{p_1(x)}{p_2(x)}\right) d\mu, \tag{11}$$

where the function $\phi$ is an arbitrary convex function defined on the interval $(0, \infty)$. In order 
to avoid meaningless expressions, the function $\phi$ is assumed to satisfy some conventional 
conditions given in Csiszár [6]. 
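To illustrate how a choice of the convex function $\phi$ selects a particular measure, here is a minimal discrete sketch of the $\phi$-divergence. The helper name `csiszar` and the sample vectors are assumptions made for illustration; the two choices of $\phi$ shown are the standard ones that recover the relative information and the J-divergence.

```python
import math

def csiszar(p1, p2, phi):
    """Discrete Csiszar phi-divergence: sum over x of p2(x) * phi(p1(x) / p2(x))."""
    return sum(b * phi(a / b) for a, b in zip(p1, p2))

# Hypothetical strictly positive distributions (assumed for illustration).
p1 = [0.2, 0.5, 0.3]
p2 = [0.4, 0.4, 0.2]

# phi(x) = x ln x recovers the relative information K(p1||p2).
k_via_phi = csiszar(p1, p2, lambda x: x * math.log(x))
k_direct = sum(a * math.log(a / b) for a, b in zip(p1, p2))

# phi(x) = (x - 1) ln x recovers the J-divergence.
j_via_phi = csiszar(p1, p2, lambda x: (x - 1) * math.log(x))
j_direct = sum((a - b) * math.log(a / b) for a, b in zip(p1, p2))

print(k_via_phi, k_direct)
print(j_via_phi, j_direct)
```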

In this paper we shall present two-parameter generalizations of the AG-divergence. 
We shall also present a one-parameter unified generalization of the measures (3) and (4). 
For two-parameter generalizations of the measures (2) and (3) refer to Taneja [22]; see 
also the online book by the author [23]. In this paper we also connect the generalized 
AG-divergence measures with the Fisher measure of information. The comparison 
of experiments is also studied applying Blackwell's [2] approach. 

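2. Unified (r, s)-Arithmetic and Geometric Mean Divergence Measures 

In this section, we shall present two different ways of generalizing the AG-divergence 
measure (4). Before that, we give the two-parameter unified $(r, s)$-generalization [22] of 
the relative information: 

$$\mathcal{K}_r^s(\xi_1;\xi_2\|X) = \begin{cases} K_r^s(\xi_1;\xi_2\|X), & r \ne 1,\ s \ne 1 \\ K_1^s(\xi_1;\xi_2\|X), & r = 1,\ s \ne 1 \\ K_r^1(\xi_1;\xi_2\|X), & r \ne 1,\ s = 1 \\ K(\xi_1;\xi_2\|X), & r = 1,\ s = 1 \end{cases} \tag{12}$$

for all $r > 0$ and $-\infty < s < \infty$, where 

$$K_r^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left[\left(\int p_1(x)^r p_2(x)^{1-r}\,d\mu\right)^{\frac{s-1}{r-1}} - 1\right], \quad r \ne 1,\ s \ne 1,$$

$$K_1^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left[e^{(s-1)K(\xi_1;\xi_2\|X)} - 1\right], \quad s \ne 1$$

and 

$$K_r^1(\xi_1;\xi_2\|X) = (r-1)^{-1}\ln\left(\int p_1(x)^r p_2(x)^{1-r}\,d\mu\right), \quad r \ne 1.$$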
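The four cases of the unified measure (12) can be checked numerically for continuity in $r$ and $s$. The sketch below is a discrete illustration only: it assumes $K_r^s = (s-1)^{-1}\big[(\sum p_1^r p_2^{1-r})^{(s-1)/(r-1)} - 1\big]$ with the limiting cases $K_1^s$, $K_r^1$ and $K$ as read from the definitions above, and the names `unified_k`, `p`, `q` and `eps` are hypothetical.

```python
import math

def kl(p, q):
    """Relative information K(p||q) for strictly positive discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def unified_k(p, q, r, s):
    """Unified (r, s) relative information for positive discrete p, q (r > 0)."""
    if r <= 0:
        raise ValueError("r must be positive")
    if r == 1 and s == 1:
        return kl(p, q)
    if r == 1:
        # K_1^s = (s - 1)^{-1} [exp((s - 1) K) - 1]
        return math.expm1((s - 1) * kl(p, q)) / (s - 1)
    a = sum(x ** r * y ** (1 - r) for x, y in zip(p, q))
    if s == 1:
        # K_r^1 = (r - 1)^{-1} ln(sum p^r q^(1 - r))
        return math.log(a) / (r - 1)
    return (a ** ((s - 1) / (r - 1)) - 1) / (s - 1)

# Hypothetical example distributions and a small offset from the limiting values.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
eps = 1e-6

print(unified_k(p, q, 1 + eps, 1 + eps), kl(p, q))
print(unified_k(p, q, 1 + eps, 2.0), unified_k(p, q, 1, 2.0))
print(unified_k(p, q, 2.0, 1 + eps), unified_k(p, q, 2.0, 1))
```

Each printed pair compares a value close to a limiting case with the limiting case itself.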
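2.1. First Generalizations. In (7) replace $K(\xi_1;\xi_2\|X)$ by $\mathcal{K}_r^s(\xi_1;\xi_2\|X)$; we get 

$${}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|X) = \begin{cases} {}^{1}T_r^s(\xi_1;\xi_2\|X), & r \ne 1,\ s \ne 1 \\ {}^{1}T_1^s(\xi_1;\xi_2\|X), & r = 1,\ s \ne 1 \\ {}^{1}T_r^1(\xi_1;\xi_2\|X), & r \ne 1,\ s = 1 \\ T(\xi_1;\xi_2\|X), & r = 1,\ s = 1 \end{cases} \tag{13}$$

for all $r > 0$ and $-\infty < s < \infty$, where 

$${}^{1}T_r^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left\{\frac{1}{2}\left[\left(\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r p_1(x)^{1-r}\,d\mu\right)^{\frac{s-1}{r-1}} + \left(\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r p_2(x)^{1-r}\,d\mu\right)^{\frac{s-1}{r-1}}\right] - 1\right\}, \quad r \ne 1,\ s \ne 1,$$

$${}^{1}T_1^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left\{\frac{1}{2}\left[e^{(s-1)K\left(\frac{\xi_1+\xi_2}{2};\xi_1\|X\right)} + e^{(s-1)K\left(\frac{\xi_1+\xi_2}{2};\xi_2\|X\right)}\right] - 1\right\}, \quad s \ne 1$$

and 

$${}^{1}T_r^1(\xi_1;\xi_2\|X) = (r-1)^{-1}\,\frac{1}{2}\left[\ln\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r p_1(x)^{1-r}\,d\mu + \ln\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r p_2(x)^{1-r}\,d\mu\right], \quad r \ne 1.$$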



2.2. Second Generalizations. In particular, when $r = s$ in (13), we get 

$${}^{1}T_s^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left[\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^s \left(\frac{p_1(x)^{1-s}+p_2(x)^{1-s}}{2}\right) d\mu - 1\right] \tag{14}$$

for all $s \ne 1$, $s > 0$. 

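We shall use the expression (14) to give the alternative generalizations of the AG-divergence 
measure. This unified way is given by 

$${}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|X) = \begin{cases} {}^{2}T_r^s(\xi_1;\xi_2\|X), & r \ne 1,\ s \ne 1 \\ {}^{2}T_1^s(\xi_1;\xi_2\|X), & r = 1,\ s \ne 1 \\ {}^{2}T_r^1(\xi_1;\xi_2\|X), & r \ne 1,\ s = 1 \\ T(\xi_1;\xi_2\|X), & r = 1,\ s = 1 \end{cases} \tag{15}$$

for all $r > 0$ and $-\infty < s < \infty$, where 

$${}^{2}T_r^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left\{\left[\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r \left(\frac{p_1(x)^{1-r}+p_2(x)^{1-r}}{2}\right) d\mu\right]^{\frac{s-1}{r-1}} - 1\right\}, \quad r \ne 1,\ s \ne 1,$$

$${}^{2}T_1^s(\xi_1;\xi_2\|X) = (s-1)^{-1}\left[e^{(s-1)T(\xi_1;\xi_2\|X)} - 1\right], \quad s \ne 1$$

and 

$${}^{2}T_r^1(\xi_1;\xi_2\|X) = (r-1)^{-1}\ln\left[\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^r \left(\frac{p_1(x)^{1-r}+p_2(x)^{1-r}}{2}\right) d\mu\right], \quad r \ne 1,$$

for all $r > 0$. 
In particular, we have 

$${}^{1}T_s^s(\xi_1;\xi_2\|X) = {}^{2}T_s^s(\xi_1;\xi_2\|X).$$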
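The coincidence of the two generalizations on the diagonal $r = s$ can be seen numerically. The sketch below is a discrete illustration under stated assumptions: `t1` takes the first generalization to be the average of the two $(r, s)$ relative informations measured from the midpoint (the result of substituting $\mathcal{K}_r^s$ for $K$ in (7)), `t2` takes the second generalization to apply the power after averaging, and both cover only $r \ne 1$, $s \ne 1$. All names and sample values are hypothetical.

```python
def midpoint(p, q):
    """Arithmetic mean (p + q) / 2 of two distributions."""
    return [(a + b) / 2 for a, b in zip(p, q)]

def t1(p, q, r, s):
    """First generalization (r != 1, s != 1): power applied to each term, then averaged."""
    m = midpoint(p, q)
    a = sum(x ** r * y ** (1 - r) for x, y in zip(m, p))
    b = sum(x ** r * y ** (1 - r) for x, y in zip(m, q))
    t = (s - 1) / (r - 1)
    return (0.5 * (a ** t + b ** t) - 1) / (s - 1)

def t2(p, q, r, s):
    """Second generalization (r != 1, s != 1): terms averaged first, then the power."""
    m = midpoint(p, q)
    c = sum(x ** r * 0.5 * (y ** (1 - r) + z ** (1 - r))
            for x, y, z in zip(m, p, q))
    return (c ** ((s - 1) / (r - 1)) - 1) / (s - 1)

# Hypothetical strictly positive distributions.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

for s in (0.5, 2.0, 3.0):
    print(s, t1(p, q, s, s), t2(p, q, s, s))  # equal on the diagonal r = s
print(t1(p, q, 2.0, 3.0), t2(p, q, 2.0, 3.0))  # generally different off the diagonal
```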



2.3. Composition Relations. We observe that the measures ${}^{\alpha}\mathcal{T}_r^s(\xi_1;\xi_2\|X)$ ($\alpha = 1, 2$) 
are continuous with respect to the parameters $r$ and $s$. This allows us to write them in 
the following simplified way: 

$${}^{\alpha}\mathcal{T}_r^s(\xi_1;\xi_2\|X) = CE\left\{{}^{\alpha}T_r^s(\xi_1;\xi_2\|X) \,\big|\, r > 0,\ r \ne 1,\ s \ne 1\right\}, \quad \alpha = 1, 2, \tag{16}$$

where "CE" stands for "continuous extension" with respect to $r$ and $s$. 
Also we can write 

(17) 1 T r s ^ 1 ^ 2 \\X)=K( 1 T r \^^ 2 \\X)) 
and 

$${}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|X) = N_s\left({}^{2}\mathcal{T}_r^1(\xi_1;\xi_2\|X)\right), \tag{18}$$

where $N_s : (0, \infty) \to \mathbb{R}$ (reals) is given by 

$$N_s(x) = \begin{cases} (s-1)^{-1}\left[e^{(s-1)x} - 1\right], & s \ne 1 \\ x, & s = 1. \end{cases} \tag{19}$$



Proposition 1. The measure $N_s(x)$ given above has the following properties: 

(i) $N_s(x) \ge 0$ with equality iff $x = 0$; 

(ii) $N_s(x)$ is an increasing function of $x$; 

(iii) $N_s(x)$ is an increasing function of $s$; 

(iv) $N_s(x)$ is a strictly convex function of $x$ for $s > 1$; 

(v) $N_s(x)$ is a strictly concave function of $x$ for $s < 1$. 
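The five properties can be spot-checked numerically. The sketch below assumes the composition function has the form $(s-1)^{-1}[e^{(s-1)x} - 1]$ for $s \ne 1$ and $x$ for $s = 1$; the name `n_s` and the grids `xs` and `ss` are illustrative choices, and a finite grid is of course not a proof.

```python
import math

def n_s(x, s):
    """Composition function: (exp((s - 1) x) - 1) / (s - 1) for s != 1, and x for s = 1."""
    if s == 1:
        return x
    return math.expm1((s - 1) * x) / (s - 1)

xs = [0.0, 0.1, 0.5, 1.0, 2.0]   # increasing grid of arguments
ss = [-1.0, 0.5, 1.0, 2.0, 3.0]  # increasing grid of parameters

# (i) nonnegativity with equality at x = 0
zero_at_origin = all(n_s(0.0, s) == 0 for s in ss)
positive_elsewhere = all(n_s(x, s) > 0 for s in ss for x in xs[1:])

# (ii) increasing in x, (iii) increasing in s (for x > 0)
increasing_in_x = all(n_s(a, s) < n_s(b, s) for s in ss for a, b in zip(xs, xs[1:]))
increasing_in_s = all(n_s(x, s1) < n_s(x, s2)
                      for x in xs[1:] for s1, s2 in zip(ss, ss[1:]))

# (iv) midpoint convexity for s > 1, (v) midpoint concavity for s < 1
convex_example = n_s(1.0, 2.0) < 0.5 * (n_s(0.5, 2.0) + n_s(1.5, 2.0))
concave_example = n_s(1.0, 0.5) > 0.5 * (n_s(0.5, 0.5) + n_s(1.5, 0.5))

print(zero_at_origin, positive_elsewhere, increasing_in_x,
      increasing_in_s, convex_example, concave_example)
```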
2.4. Alternative Generalizations. We see that the measure (14) is considered for 
$s > 0$. This is required for the non-negativity of the measure. We can rewrite it in a slightly 
different way, where we do not require this condition. This form is given by 

$$IT_s(\xi_1;\xi_2\|X) = [s(s-1)]^{-1}\left[\int \left(\frac{p_1(x)+p_2(x)}{2}\right)^s \left(\frac{p_1(x)^{1-s}+p_2(x)^{1-s}}{2}\right) d\mu - 1\right], \tag{20}$$

where $s \ne 0, 1$. 

The measure (20) admits the following limiting cases: 

$$\lim_{s \to 0} IT_s(\xi_1;\xi_2\|X) = I(\xi_1;\xi_2\|X)$$

and 

$$\lim_{s \to 1} IT_s(\xi_1;\xi_2\|X) = T(\xi_1;\xi_2\|X),$$

where $I(\xi_1;\xi_2\|X)$ and $T(\xi_1;\xi_2\|X)$ are as given by (3) and (4) respectively. 

In view of these limiting cases, we re-write the measure (20) in the following unified 
way: 

$$\mathcal{IT}_s(\xi_1;\xi_2\|X) = \begin{cases} IT_s(\xi_1;\xi_2\|X), & s \ne 0, 1 \\ I(\xi_1;\xi_2\|X), & s = 0 \\ T(\xi_1;\xi_2\|X), & s = 1. \end{cases} \tag{21}$$
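The two limiting cases can be observed numerically by evaluating the one-parameter measure close to $s = 0$ and $s = 1$. The following discrete sketch assumes the form $[s(s-1)]^{-1}\big[\sum m^s (p_1^{1-s} + p_2^{1-s})/2 - 1\big]$ with $m = (p_1 + p_2)/2$; the function names, the sample vectors and the offset `eps` are hypothetical.

```python
import math

def kl(p, q):
    """Relative information K(p||q) for strictly positive discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def midpoint(p, q):
    return [(a + b) / 2 for a, b in zip(p, q)]

def js_div(p, q):
    """Jensen-Shannon divergence I."""
    m = midpoint(p, q)
    return 0.5 * (kl(p, m) + kl(q, m))

def ag_div(p, q):
    """AG-divergence T."""
    m = midpoint(p, q)
    return 0.5 * (kl(m, p) + kl(m, q))

def it_s(p, q, s):
    """One-parameter measure for s not in {0, 1}."""
    m = midpoint(p, q)
    c = sum(x ** s * 0.5 * (a ** (1 - s) + b ** (1 - s))
            for x, a, b in zip(m, p, q))
    return (c - 1) / (s * (s - 1))

# Hypothetical strictly positive distributions and a small offset.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
eps = 1e-6

print(it_s(p, q, eps), js_div(p, q))      # s -> 0 approaches I
print(it_s(p, q, 1 - eps), ag_div(p, q))  # s -> 1 approaches T
print(it_s(p, q, -2.0))                   # negative s is allowed in this form
```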

3. Relationship with Csiszár $\phi$-Divergence 

We can relate the above generalizations of the AG-divergence measure to the well-known 
Csiszár $\phi$-divergence. The relation is as follows: 



and 
where 



Vs{y) = 



^ _ 1)-1 [yS-l _ !] ; 

y, s = i 

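and $\phi(\xi_1,\xi_2\|X)$, $\phi^*(\xi_1,\xi_2\|X)$ and $\tilde{\phi}(\xi_1,\xi_2\|X)$ are the $\phi$-divergences of $\xi_1, \xi_2$ in the 
notation of Vajda with 

$$\phi(x) = \left(\frac{1+x}{2}\right)^{r},$$

$$\phi^*(x) = x\,\phi\left(\frac{1}{x}\right)$$

and 

$$\tilde{\phi}(x) = \frac{1}{2}\left[\phi(x) + \phi^*(x)\right].$$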
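We can also write the measure (21) in terms of the Csiszár $\phi$-divergence as follows: 

$$\mathcal{IT}_s(\xi_1;\xi_2\|X) = \int p_2(x)\,\phi_{\mathcal{IT}_s}\left(\frac{p_1(x)}{p_2(x)}\right) d\mu, \tag{22}$$

where 

$$\phi_{\mathcal{IT}_s}(x) = \begin{cases} [2s(s-1)]^{-1}\left[\left(x^{1-s}+1\right)\left(\frac{x+1}{2}\right)^s - (x+1)\right], & s \ne 0, 1 \\ \frac{x}{2}\ln x + \left(\frac{x+1}{2}\right)\ln\left(\frac{2}{x+1}\right), & s = 0 \\ \left(\frac{x+1}{2}\right)\ln\left(\frac{x+1}{2\sqrt{x}}\right), & s = 1 \end{cases} \tag{23}$$

for all $x \in (0, \infty)$. 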
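The representation as a $\phi$-divergence can be checked against the direct formula on a small discrete example. The sketch below assumes the generating function $[2s(s-1)]^{-1}[(x^{1-s}+1)((x+1)/2)^s - (x+1)]$ for $s \ne 0, 1$, with the $s = 0$ and $s = 1$ branches taken to be the generating functions of the Jensen-Shannon and AG divergences; `phi_it`, `it_via_phi` and `it_direct` are hypothetical names.

```python
import math

def phi_it(x, s):
    """Assumed generating function of the unified one-parameter measure."""
    if s == 0:
        return 0.5 * x * math.log(x) + 0.5 * (x + 1) * math.log(2 / (x + 1))
    if s == 1:
        return 0.5 * (x + 1) * math.log((x + 1) / (2 * math.sqrt(x)))
    return ((x ** (1 - s) + 1) * ((x + 1) / 2) ** s - (x + 1)) / (2 * s * (s - 1))

def it_via_phi(p1, p2, s):
    """Csiszar form: sum of p2 * phi(p1 / p2)."""
    return sum(b * phi_it(a / b, s) for a, b in zip(p1, p2))

def it_direct(p1, p2, s):
    """Direct form for s not in {0, 1}."""
    c = sum(((a + b) / 2) ** s * 0.5 * (a ** (1 - s) + b ** (1 - s))
            for a, b in zip(p1, p2))
    return (c - 1) / (s * (s - 1))

# Hypothetical strictly positive distributions.
p1 = [0.2, 0.5, 0.3]
p2 = [0.4, 0.4, 0.2]

for s in (-1.0, 0.5, 2.0):
    print(s, it_via_phi(p1, p2, s), it_direct(p1, p2, s))
print(phi_it(1.0, 0), phi_it(1.0, 1), phi_it(1.0, 2.0))  # phi(1) = 0 in each branch
```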

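Proposition 2. For all $r > 0$ and $-\infty < s < \infty$, we have 

(i) ${}^{\alpha}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \ge 0$ ($\alpha = 1, 2$); 

(ii) ${}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \le {}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|X)$ for $s \le r$, and ${}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \ge {}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|X)$ for $s \ge r$; 

(iii) $\mathcal{IT}_s(\xi_1;\xi_2\|X) \ge 0$. 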



4. Divergence Measures and Sufficiency of Experiments 

Blackwell's [2] definition of comparison of experiments states that experiment $\mathcal{E}_X$ is 
sufficient for experiment $\mathcal{E}_Y$, denoted by $\mathcal{E}_X \succeq \mathcal{E}_Y$, if there exists a stochastic transformation 
of $X$ to a random variable $Z(X)$ such that for each $\theta \in \Theta$ the random variables $Z(X)$ 
and $Y$ have identical distributions. By $\mathcal{E}_Y = \{Y, S_Y, Q_\theta;\ \theta \in \Theta\}$ we shall denote a second 
statistical experiment for which there exists a gpdf $g(y|\theta)$ for the distribution $Q_\theta$ with 
respect to a $\sigma$-finite measure $\nu$. According to this definition, if $\mathcal{E}_X \succeq \mathcal{E}_Y$, then there 
exists a nonnegative function $h$ satisfying (DeGroot [7]) 

$$g(y|\theta) = \int_X h(y|x) f(x|\theta)\,d\mu \tag{24}$$

and 

$$\int_Y h(y|x)\,d\nu = 1.$$

Changing the order of integration in (24), we get 

$$g_i(y) = \int_X h(y|x) f_i(x)\,d\mu, \quad i = 1, 2. \tag{25}$$



Let $I$ be any measure of information contained in an experiment. If $\mathcal{E}_X \succeq \mathcal{E}_Y$ implies 
that $I_X \ge I_Y$, then we say that $\mathcal{E}_X$ is as informative as $\mathcal{E}_Y$. This approach was successfully 
carried out by Lindley [14] for the Shannon entropy. Goel and DeGroot [10] applied it to the 
Kullback and Leibler [13] relative information. Ferentinos and Papaioannou [8] applied it to the 
$\alpha$-order generalization of the Kullback and Leibler relative information and to generalizations 
of the Fisher measure of information. The author [20] extended it to different generalizations of 
the J-divergence measure having two scalar parameters. For the I-divergence measure and 
its two-parameter generalizations refer to Taneja et al. [24]. Here our aim is to compare 
experiments for the unified $(r,s)$-AG-divergences given by (13) and (15). The results are 
also extended to the measure (21). 



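Theorem 1. If $\mathcal{E}_X \succeq \mathcal{E}_Y$, then ${}^{\alpha}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \ge {}^{\alpha}\mathcal{T}_r^s(\xi_1;\xi_2\|Y)$ ($\alpha = 1, 2$) for every 
$\xi_1, \xi_2 \in \Xi$, for all $r > 0$ and $-\infty < s < \infty$. 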

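Proof. Since $\mathcal{E}_X \succeq \mathcal{E}_Y$, there exists a function $h$ satisfying (24) and (25). Then we can 
write 

$$\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r} g_1(y)^{1-r} = \left[\int_X h(y|x)\,\frac{f_1(x)+f_2(x)}{2}\,d\mu\right]^{r}\left[\int_X h(y|x) f_1(x)\,d\mu\right]^{1-r}. \tag{26}$$

Applying Hölder's inequality on the right side of (26), we get 

$$\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r} g_1(y)^{1-r} \begin{cases} \ge \displaystyle\int_X \left[h(y|x)\,\frac{f_1(x)+f_2(x)}{2}\right]^{r}\left[h(y|x) f_1(x)\right]^{1-r} d\mu, & 0 < r < 1 \\ \le \displaystyle\int_X \left[h(y|x)\,\frac{f_1(x)+f_2(x)}{2}\right]^{r}\left[h(y|x) f_1(x)\right]^{1-r} d\mu, & r > 1. \end{cases} \tag{27}$$

Hence, integrating over $Y$ and using $\int_Y h(y|x)\,d\nu = 1$, 

$$\int_Y g_1(y)^{1-r}\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r} d\nu \begin{cases} \ge \displaystyle\int_X f_1(x)^{1-r}\left(\frac{f_1(x)+f_2(x)}{2}\right)^{r} d\mu, & 0 < r < 1 \\ \le \displaystyle\int_X f_1(x)^{1-r}\left(\frac{f_1(x)+f_2(x)}{2}\right)^{r} d\mu, & r > 1. \end{cases} \tag{28}$$

As $\mathrm{sign}\left(\frac{s-1}{r-1}\right) = \mathrm{sign}(s-1)$ for $r > 1$ and $\mathrm{sign}\left(\frac{s-1}{r-1}\right) \ne \mathrm{sign}(s-1)$ for $0 < r < 1$, 
where $\mathrm{sign}(x) = 1$ if $x > 0$ and $\mathrm{sign}(x) = -1$ if $x < 0$, then from (28) one gets 

$$\frac{1}{s-1}\left[\int_Y g_1(y)^{1-r}\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r} d\nu\right]^{\frac{s-1}{r-1}} \le \frac{1}{s-1}\left[\int_X f_1(x)^{1-r}\left(\frac{f_1(x)+f_2(x)}{2}\right)^{r} d\mu\right]^{\frac{s-1}{r-1}} \tag{29}$$

for all $r \ne 1$, $s \ne 1$, $r > 0$. 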
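Similarly, we can obtain 

$$\frac{1}{s-1}\left[\int_Y g_2(y)^{1-r}\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r} d\nu\right]^{\frac{s-1}{r-1}} \le \frac{1}{s-1}\left[\int_X f_2(x)^{1-r}\left(\frac{f_1(x)+f_2(x)}{2}\right)^{r} d\mu\right]^{\frac{s-1}{r-1}} \tag{30}$$

for all $r \ne 1$, $s \ne 1$, $r > 0$. 

Adding (29) and (30), subtracting $2(s-1)^{-1}$ ($s \ne 1$) and then dividing by 2, we get 

$${}^{1}T_r^s(\xi_1;\xi_2\|X) \ge {}^{1}T_r^s(\xi_1;\xi_2\|Y), \quad r \ne 1,\ s \ne 1,\ r > 0.$$

Since the unified measure ${}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|X)$ given in (13) is a continuous extension of 
${}^{1}T_r^s(\xi_1;\xi_2\|X)$ for the real parameters $r$ and $s$, we can immediately conclude that 

$${}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \ge {}^{1}\mathcal{T}_r^s(\xi_1;\xi_2\|Y), \quad r > 0,$$

whenever $\mathcal{E}_X \succeq \mathcal{E}_Y$. 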



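Let us now prove the second part. Since $\mathcal{E}_X \succeq \mathcal{E}_Y$, there exists a function $h$ satisfying 
(24) and (25). Then we can write 

$$\left(\frac{g_1(y)+g_2(y)}{2}\right)^{r}\left(\frac{g_1(y)^{1-r}+g_2(y)^{1-r}}{2}\right) = \frac{1}{2}\left[\int_X h(y|x)\,\frac{f_1(x)+f_2(x)}{2}\,d\mu\right]^{r}\left[\int_X h(y|x) f_1(x)\,d\mu\right]^{1-r} + \frac{1}{2}\left[\int_X h(y|x)\,\frac{f_1(x)+f_2(x)}{2}\,d\mu\right]^{r}\left[\int_X h(y|x) f_2(x)\,d\mu\right]^{1-r}. \tag{31}$$

Applying Hölder's inequality in (31), integrating over $Y$, and using the fact that 
$\int_Y h(y|x)\,d\nu = 1$, we get 

$$\int_Y \left(\frac{g_1(y)+g_2(y)}{2}\right)^{r}\left(\frac{g_1(y)^{1-r}+g_2(y)^{1-r}}{2}\right) d\nu \begin{cases} \ge \displaystyle\int_X \left(\frac{f_1(x)+f_2(x)}{2}\right)^{r}\left(\frac{f_1(x)^{1-r}+f_2(x)^{1-r}}{2}\right) d\mu, & 0 < r < 1 \\ \le \displaystyle\int_X \left(\frac{f_1(x)+f_2(x)}{2}\right)^{r}\left(\frac{f_1(x)^{1-r}+f_2(x)^{1-r}}{2}\right) d\mu, & r > 1. \end{cases} \tag{32}$$

As $\mathrm{sign}\left(\frac{s-1}{r-1}\right) = \mathrm{sign}(s-1)$ for $r > 1$ and $\mathrm{sign}\left(\frac{s-1}{r-1}\right) \ne \mathrm{sign}(s-1)$ for $0 < r < 1$, we 
have 

$$\frac{1}{s-1}\left[\int_Y \left(\frac{g_1(y)+g_2(y)}{2}\right)^{r}\left(\frac{g_1(y)^{1-r}+g_2(y)^{1-r}}{2}\right) d\nu\right]^{\frac{s-1}{r-1}} \le \frac{1}{s-1}\left[\int_X \left(\frac{f_1(x)+f_2(x)}{2}\right)^{r}\left(\frac{f_1(x)^{1-r}+f_2(x)^{1-r}}{2}\right) d\mu\right]^{\frac{s-1}{r-1}}. \tag{33}$$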
r-l 
Subtracting $(s-1)^{-1}$ ($s \ne 1$) on both sides of (33) we get 

$${}^{2}T_r^s(\xi_1;\xi_2\|X) \ge {}^{2}T_r^s(\xi_1;\xi_2\|Y), \quad r \ne 1,\ s \ne 1,\ r > 0,$$

and consequently, we have 

$${}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|X) \ge {}^{2}\mathcal{T}_r^s(\xi_1;\xi_2\|Y), \quad r > 0,$$

whenever $\mathcal{E}_X \succeq \mathcal{E}_Y$. □ 
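The monotonicity stated in Theorem 1 can be probed with a small simulation: a row-stochastic matrix plays the role of $h(y|x)$, the densities $f_1, f_2$ on $X$ are pushed forward to $g_1, g_2$ on $Y$, and both generalized measures are compared before and after. This is only a discrete sketch under stated assumptions: `t1` and `t2` use the $r \ne 1$, $s \ne 1$ forms (power of each integral then averaged, and averaged integral then power, respectively), and every name, the seed and the parameter grid are hypothetical.

```python
import random

def midpoint(p, q):
    return [(a + b) / 2 for a, b in zip(p, q)]

def t1(p, q, r, s):
    """First generalization for r != 1, s != 1 (assumed form)."""
    m = midpoint(p, q)
    a = sum(x ** r * y ** (1 - r) for x, y in zip(m, p))
    b = sum(x ** r * y ** (1 - r) for x, y in zip(m, q))
    t = (s - 1) / (r - 1)
    return (0.5 * (a ** t + b ** t) - 1) / (s - 1)

def t2(p, q, r, s):
    """Second generalization for r != 1, s != 1 (assumed form)."""
    m = midpoint(p, q)
    c = sum(x ** r * 0.5 * (y ** (1 - r) + z ** (1 - r))
            for x, y, z in zip(m, p, q))
    return (c ** ((s - 1) / (r - 1)) - 1) / (s - 1)

def random_dist(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def push_forward(f, h):
    """g(y) = sum over x of h(y|x) f(x), with h[x][y] = h(y|x) and rows summing to 1."""
    return [sum(f[x] * h[x][y] for x in range(len(f))) for y in range(len(h[0]))]

rng = random.Random(0)  # fixed seed so the run is reproducible
violations = 0
for _ in range(200):
    f1, f2 = random_dist(4, rng), random_dist(4, rng)
    h = [random_dist(3, rng) for _ in range(4)]
    g1, g2 = push_forward(f1, h), push_forward(f2, h)
    for r in (0.5, 2.0, 3.0):
        for s in (-1.0, 0.5, 2.0):
            for measure in (t1, t2):
                # Count cases where the coarser experiment Y looks more informative.
                if measure(f1, f2, r, s) < measure(g1, g2, r, s) - 1e-12:
                    violations += 1

print(violations)
```

A count of zero is consistent with the theorem; it does not replace the proof.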

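Theorem 2. If $\mathcal{E}_X \succeq \mathcal{E}_Y$, then $\mathcal{IT}_s(\xi_1;\xi_2\|X) \ge \mathcal{IT}_s(\xi_1;\xi_2\|Y)$ for every $\xi_1, \xi_2 \in \Xi$, for 
all $s \in (-\infty, \infty)$. 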

The proof of the above theorem is based on the following lemmas. 

Lemma 1. (Joint convexity). If $\phi : (0, \infty) \to \mathbb{R}$ is convex, then $C_\phi(\xi_1;\xi_2\|X)$ is jointly 
convex for every $\xi_1, \xi_2 \in \Xi$. Moreover, if $\phi(1) = 0$, then $C_\phi(\xi_1;\xi_2\|X) \ge 0$. 

Lemma 2. If $\mathcal{E}_X \succeq \mathcal{E}_Y$, then $C_\phi(\xi_1;\xi_2\|X) \ge C_\phi(\xi_1;\xi_2\|Y)$ for every $\xi_1, \xi_2 \in \Xi$, provided 
$\phi$ is convex. 

Lemma 1 is due to Csiszár and Lemma 2 is due to Ferentinos and Papaioannou [8]. 

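Proof of Theorem 2. In view of Lemmas 1 and 2 it is sufficient to prove the convexity 
of the function $\phi_{\mathcal{IT}_s}(x)$ given by (23). This follows from the derivatives 

$$\phi'_{\mathcal{IT}_s}(x) = \begin{cases} [2s(s-1)]^{-1}\left[(1-s)\,x^{-s}\left(\frac{x+1}{2}\right)^{s} + \frac{s}{2}\left(x^{1-s}+1\right)\left(\frac{x+1}{2}\right)^{s-1} - 1\right], & s \ne 0, 1 \\ \frac{1}{2}\ln\left(\frac{2x}{x+1}\right), & s = 0 \\ \frac{1}{4}\left[1 - x^{-1} - \ln x - 2\ln\left(\frac{2}{x+1}\right)\right], & s = 1 \end{cases} \tag{34}$$

and 

$$\phi''_{\mathcal{IT}_s}(x) = \begin{cases} \frac{1}{8}\left(x^{-s-1}+1\right)\left(\frac{x+1}{2}\right)^{s-2}, & s \ne 0, 1 \\ \frac{1}{2x(x+1)}, & s = 0 \\ \frac{x^2+1}{4x^2(x+1)}, & s = 1. \end{cases} \tag{35}$$

Thus we have $\phi''_{\mathcal{IT}_s}(x) > 0$ for all $x > 0$, and hence $\phi_{\mathcal{IT}_s}(x)$ is convex for all $x > 0$. 
Also, we have $\phi_{\mathcal{IT}_s}(1) = 0$. In view of this we can say that the I and T-divergence of type 
$s$ given by (21) is nonnegative and convex in the pair of probability distributions P and 
Q. □ 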
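The convexity argument rests on the sign of the second derivative, which can be cross-checked by finite differences. The sketch below assumes the generating function $[2s(s-1)]^{-1}[(x^{1-s}+1)((x+1)/2)^s - (x+1)]$ with its $s = 0$ and $s = 1$ limits, and compares a central second difference with the closed form $\frac{1}{8}(x^{-s-1}+1)((x+1)/2)^{s-2}$, which covers all three cases. The names `phi_it`, `phi_it_second` and `second_difference` and the step size are hypothetical choices.

```python
import math

def phi_it(x, s):
    """Assumed generating function of the unified one-parameter measure."""
    if s == 0:
        return 0.5 * x * math.log(x) + 0.5 * (x + 1) * math.log(2 / (x + 1))
    if s == 1:
        return 0.5 * (x + 1) * math.log((x + 1) / (2 * math.sqrt(x)))
    return ((x ** (1 - s) + 1) * ((x + 1) / 2) ** s - (x + 1)) / (2 * s * (s - 1))

def phi_it_second(x, s):
    """Closed-form second derivative; the s = 0 and s = 1 cases are its special values."""
    return (x ** (-s - 1) + 1) / 8 * ((x + 1) / 2) ** (s - 2)

def second_difference(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

max_gap = 0.0
for s in (-1.0, 0.0, 0.5, 1.0, 2.0):
    for x in (0.5, 1.0, 2.5):
        approx = second_difference(lambda u: phi_it(u, s), x)
        max_gap = max(max_gap, abs(approx - phi_it_second(x, s)))

print(max_gap)  # small if the closed form matches the numerical curvature
```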

5. $\phi$-Divergence and Fisher Information Matrix 

Consider a family $\mathcal{M} = \{P_\theta,\ \theta \in \Theta\}$ of probability measures on a measurable space 
$(X, \mathcal{A})$ dominated by a finite or $\sigma$-finite measure $\mu$. The parameter space $\Theta$ can either 
be an open subset of the real line or an open subset of the $k$-dimensional Euclidean space 
$\mathbb{R}^k$. Let $f(x, \theta) = \frac{dP_\theta}{d\mu}$. Let $\mathcal{F} = \{f(x, \theta) \mid x \in X,\ \theta \in \Theta\}$. 
The Fisher [9] measure of information is given by 



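$$I_X^F(\theta) = \begin{cases} E_\theta\left[\left(\frac{\partial}{\partial\theta}\ln f(x,\theta)\right)^{2}\right], & \text{if } \theta \text{ is univariate} \\ \left\|E_\theta\left[\frac{\partial}{\partial\theta_i}\ln f(x,\theta)\,\frac{\partial}{\partial\theta_j}\ln f(x,\theta)\right]\right\|_{k \times k}, & \text{if } \theta \text{ is } k\text{-variate}, \end{cases} \tag{36}$$

where $\|\cdot\|_{k \times k}$ denotes a $k \times k$ matrix and $E_\theta$ denotes the expectation with respect to 
$f(x,\theta)$, where $f(x,\theta) \in \mathcal{F}$. Let us suppose that the following regularity conditions are 
satisfied: 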



(a) $\frac{\partial}{\partial\theta_i} f(x, \theta)$ exists for all $x \in X$, all $\theta \in \Theta$, and all $i = 1, 2, \ldots, k$. 



(b) For any A G A, 



For $f_1, f_2 \in \mathcal{F}$, the Csiszár [6] $\phi$-divergence can be re-written as 

$$C_\phi(f_1\|f_2) = \int f_2\,\phi\left(\frac{f_1}{f_2}\right) d\mu, \tag{37}$$

with $\phi(1)$ not necessarily zero, where $\phi(x)$ is a continuously differentiable nonnegative real 
function. As usual, the function $\phi(x)$ is generally supposed to be convex, but here we 
do not assume that $\phi(x)$ is convex. 

Following Kagan [12] and Ferentinos and Papaioannou [8], we define 



f{x, 9 + if*) + f(x, 9 + t ej 



(38) 7£(0)=limiM-ji^ 
Then the Csiszar parametric matrix is given by 

(39) &)=pm\L k , 

where $\theta + te_i,\ \theta + te_j \in \Theta$, $i, j = 1, 2, \ldots, k$ and $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, \ldots, 0)$, $\ldots$, 
$e_k = (0, 0, \ldots, 1)$ are the unit vectors. 



Suppose the following conditions hold: 



( c ) / a$e;f( x > 9 ) d » 



< oo for all 9 G 6 and i, j = 1,2,..., k. 



(d) The third order partial derivatives of $f(x, \theta)$ with respect to $\theta$ exist for all $\theta \in \Theta$ 
and $x \in X$. 



Based on the above considerations the following theorem holds. 
Theorem 3. If the conditions (a)-(d) are satisfied, then for all $\theta \in \Theta$, we have 



^ a f Ix{Q)i @ i s univariate 



(40) 



where 



with 



2 



[S%(6) + I$(6)] , if 9 is k — variate 



1 



S5(0) = - [M F X {9) + M F X {9) T ] 



Mm 



and 



im = Eo 



d_ 

89, 



■ im 

■ im 

■ &) 

\nf(x, 9) 



r« W Wkxk 



This result has been derived by Aggarwal [1]. Similar results derived for the Rényi, 
Kagan, Kullback-Leibler and Matusita measures of information can be seen in Kagan [12], 
Aggarwal [1], Boekee, Ferentinos and Papaioannou [8], Taneja [20], Salicrú and Taneja 
[17], etc. 

6. Unified (r, s)-T-Divergence and Fisher Information Matrix 

To get the relationship between the unified $(r, s)$-T-divergence and the Fisher information 
matrix, we first give the following proposition due to Salicrú and Taneja [17]. 

Proposition 3. Let 

Gj(/ 1 ||/ 2 ) = /i(J^(/ 1 ||/ 2 )-0(l)) > 

where h is a continuous differentiable real function with h(0) = 0, and I/2) is given 

by \3l\) . Suppose the conditions (a) -(d) are satisfied. Then for 9 6 0, we have 

G\ {&)) = tiwrn, 

where 

Gimo)) = \K( i m)\L k 

with 



Gl (&)) 

= Jiminf i ft (k. ( /(l ,,)||/M±M±iM±M 
and I x (9) is the Csiszdr information matrix given in h39) . 
Now we shall apply the above results to connect the measures (15) and (21) with the 
Fisher measure of information. 

Proposition 4. If the conditions (a)-(d) are satisfied, then for all $\theta \in \Theta$, we have 

(41) 1 V{d) = r -[S F x {6) + im], 

(42) 2 V(0) = l[Sm + Ix(0)] 
and 

(43) xr s {e) = l -[s F x {e) + i F x {e)]. 

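Proof. We shall prove each part separately. 
We can write 

$${}^{1}T_r^s(f_1\|f_2) = h\left(C_{\phi_1}(f_1\|f_2) - \phi_1(1)\right) + h\left(C_{\phi_2}(f_1\|f_2) - \phi_2(1)\right),$$

where 

$$\phi_1(x) = x\left(\frac{1+x}{2x}\right)^{r}, \quad r \ne 1,\ r > 0,$$

$$\phi_2(x) = \left(\frac{1+x}{2}\right)^{r}, \quad r \ne 1,\ r > 0$$

and 

$$h(x) = [2(s-1)]^{-1}\left[(x+1)^{\frac{s-1}{r-1}} - 1\right], \quad r \ne 1,\ s \ne 1,\ r > 0.$$

This gives 

$$\phi_1''(1) = \phi_2''(1) = \frac{r(r-1)}{4}$$

and 

$$h'(0) = [2(r-1)]^{-1}, \quad r \ne 1,\ r > 0.$$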

We have 



1 W = ^W+*)],^l,r>0 



and consequently, 



for all $\theta \in \Theta$, $0 < r < \infty$ and $-\infty < s < \infty$. 
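Again, we can write 

$${}^{2}T_r^s(f_1\|f_2) = h\left(C_{\phi}(f_1\|f_2) - \phi(1)\right),$$

where 

$$\phi(x) = \left(\frac{x+1}{2}\right)^{r}\left(\frac{x^{1-r}+1}{2}\right), \quad r \ne 1,\ r > 0$$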
and 

$$h(x) = (s-1)^{-1}\left[(x+1)^{\frac{s-1}{r-1}} - 1\right], \quad r \ne 1,\ s \ne 1,\ r > 0.$$

This gives 

$$\phi''(1) = \frac{r(r-1)}{4}$$

and 

$$h'(0) = (r-1)^{-1}, \quad r \ne 1,\ r > 0.$$

We have 

2 ^) = ^[^) + /M,r^l,r>0 

and consequently, 

2 V(o) = {filsZ(o) + iZ(o)], 

for all $\theta \in \Theta$, $0 < r < \infty$ and $-\infty < s < \infty$. 

It is easy to check that $\phi''_{\mathcal{IT}_s}(1) = \frac{1}{4}$ for $s \in (-\infty, \infty)$. This gives 

for all $\theta \in \Theta$. □ 